Optimized single-cell RNA sequencing protocol to study early genome activation in mammalian preimplantation development

SUMMARY
Here, we present a modification of the single-cell tagged reverse transcription protocol to study gene expression at the single-cell level or with limited RNA input. We describe different enzymes for reverse transcription and cDNA amplification, a modified lysis buffer, and additional clean-up steps before cDNA amplification. We also detail an optimized single-cell RNA sequencing method for handpicked single cells, or tens to hundreds of cells, as input material to study mammalian preimplantation development. For complete details on the use and execution of this protocol, please refer to Ezer et al. 1



BEFORE YOU BEGIN
Single-cell Tagged Reverse Transcription (STRT-Seq) was one of the first multiplexed single-cell RNA sequencing (scRNAseq) methods focusing on the 5′ end of mRNA, enabling the study of transcription start sites (TSS). 2,3 Since its introduction, the method has been modified to work on newer sequencing platforms such as the Illumina NextSeq system, as well as with red blood cell-containing, globin-rich tissue samples. 1 For these protocols, 20-40 ng of template RNA is required, making them suboptimal for the limited amount of mRNA that is present in single cells. Even though high-throughput scRNAseq platforms are now available, 4 they are limited to situations where at least hundreds or thousands of cells are available, making them unsuitable for analyzing, e.g., mammalian oocytes and embryos.
It is estimated that a human oocyte contains 55 pg of mRNA. 5 Therefore, we modernized the previous STRT protocols to improve the sensitivity of the method and named the new protocol STRT-N. Our modifications include different reverse transcriptase and cDNA amplification enzymes, a modified cell lysis buffer, and an additional clean-up step prior to the cDNA amplification.
An STRT-N library consists of 48 barcoded cDNA samples, each of which can be a single cell or a single embryo (100 cells at the blastocyst stage); preparing them in one library avoids batch effects between the samples. In addition to highly consistent quantification of expression, the reads are obtained from the 5′ end of mRNA, thus allowing the detection of transcription start sites (TSS) and alternative gene promoters. In addition, if reads obtained from preimplantation embryos are at the 3′-UTR of known protein-coding genes, they are likely to derive from degraded maternal transcripts that are not translated into proteins. Most RNA sequencing methods, however, cannot distinguish whether reads at the 3′-UTR are derived from intact transcripts or from degraded ones. This distinction plays a vital role in preimplantation development studies.
The library preparation and quality control checks take 2 days. It is possible to pause the library preparation at several points of the protocol as well as to prepare multiple libraries in parallel. Guidelines for batch effect removal for accurate comparisons have been delineated if samples are processed in several libraries. 6

Institutional permissions
The mouse embryos were created under the ethical approval ID 1528 (Jose Inzunza) evaluated and accepted by Linköping's Animal Experiment Ethics Committee.

Workspace setup
The library preparation should be done in two separate areas, pre-PCR (until step 26) and post-PCR, with separate consumables and small equipment in each area. This separation is necessary to avoid contamination of the RNA and to keep RNase and DNase out of the pre-PCR area, ensuring good mRNA quality. The cell lysis buffer and capture buffer should be prepared in the pre-PCR area.

MATERIALS AND EQUIPMENT
CRITICAL: Prepare all the buffers and solutions in the pre-PCR area, preferably in a PCR workstation or RNA clean room, and use nuclease-free water. We advise that the pre- and post-PCR areas are clearly separated, with separate consumables and small equipment. If barcoded primers are ordered dry, dissolve each primer in nuclease-free water to make 100 µM stocks (store at -20°C; stable for up to 12 months). Prepare a 48-well plate with 10 µM barcoded primer solutions in numerical order by adding 10 µL of the primer and 90 µL of nuclease-free water to each well. Store at -20°C (stable for up to 12 months). Before library preparation, place the barcoded primer plate at +4°C to thaw for 3-4 h.
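The primer dilution above follows the usual C1·V1 = C2·V2 relationship. The following is a minimal sketch of that arithmetic, not part of the published protocol; the function names are hypothetical and concentrations are assumed to be in µM with volumes in µL.

```python
def diluted_concentration(stock_conc_um, stock_vol_ul, diluent_vol_ul):
    """Concentration (µM) after mixing stock with diluent (C1*V1 = C2*V2)."""
    total_vol_ul = stock_vol_ul + diluent_vol_ul
    return stock_conc_um * stock_vol_ul / total_vol_ul


def stock_volume_needed(stock_conc_um, target_conc_um, final_vol_ul):
    """Volume of stock (µL) needed to reach target_conc_um in final_vol_ul."""
    return target_conc_um * final_vol_ul / stock_conc_um


# Values from the text: 10 µL of 100 µM primer plus 90 µL of water per well
working_conc = diluted_concentration(100.0, 10.0, 90.0)
print(f"working primer concentration: {working_conc:.1f} µM")

# Inverse question: how much 100 µM stock gives 100 µL at 10 µM?
print(f"stock volume needed: {stock_volume_needed(100.0, 10.0, 100.0):.1f} µL")
```

The same helper can be reused when a different final well volume is wanted, as long as the 1:10 ratio of stock to total volume is kept.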

PCR2 primer mix for step 57 based on Ezer et al. (2021)
Timing: 20 min
To prepare the primer mix for step 57, mix the following reagents in a 1.5 mL tube:
Make aliquots of 20 µL and store at -20°C for up to 12 months.
This buffer is used to make the ERCC Spike-In dilution (1:10,000), for step 34, and for step 42 and the steps that follow.
CRITICAL: It is essential that the plate is placed on dry ice as soon as possible after placing the embryos in the lysis buffer. The plate should be placed on dry ice even if the library will be prepared on the same day, as this step helps the cell lysis.

STEP-BY-STEP METHOD DETAILS
Note: We used whole embryos to create the library demonstrated in this protocol. If you want to use embryonic single cells, an embryo biopsy can be performed with a zona drilling pipette followed by blastomere extraction. Each blastomere should then be placed into a PBS solution, hand-picked, and transferred to one well in the cell lysis plate.
Pause point: Plates can be stored at -80°C for 12 months.

Reverse transcriptase reaction (day 1)

Timing: 2 h 30 min
This is the first part of the protocol and describes the steps involved in capturing mRNA and performing the first DNA strand synthesis.
2. Take out the capture cell plate from -80°C storage (previously stored for a maximum of 6 months) and keep it on dry ice.
3. Preheat the PCR block to 80°C and transfer the plate directly from dry ice to the PCR block.
4. Hold a hand on the plate lid, and once it is warm, close the lid of the PCR machine and denature RNA for 2 min (RNA Denaturation step).
5. Place the plate immediately on ice.
6. Prepare the reverse transcriptase mix using the following reagents in a 1.5 mL Eppendorf LoBind tube.
7. Add 1 µL of ERCC Spike-In mix (1:10,000 dilution).
8. Vortex the mix and spin down quickly.
9. Divide the mixture into an 8-tube strip for easier pipetting of the mixture to the plate.
10. Add 6.5 µL to each well of the plate using an 8-channel pipette.
11. Quickly vortex and spin down the plate either in the plate centrifuge (20 s at 1500 × g) or spin the individual strips on the table centrifuge.
12. Place the plate in the PCR machine and start the reverse transcription (RT) reaction.

Figure 1. Continued
3 cytosine molecules on the 5′ end of mRNA. Oligo TSO-8UMI is used to bind to those cytosine molecules, promoting template switching and introduction of UMIs into the cDNA. cDNA is then cleaned, and the barcodes are introduced at the 5′ end. After the cDNA amplification, each sample is barcoded, and all the reactions can be pooled for the next steps, which include purification, fragmentation, and adapter cassette ligation. The library preparation is finished by further 5′ end amplification before sending the ready library for sequencing. Adapted from Ezer

CRITICAL: It is highly important that the RT reaction is mixed thoroughly before starting the PCR cycles.
Clean up and full cDNA amplification (day 1)

Timing: 4 h
This section describes the clean-up steps of the reverse transcriptase reaction, which remove unnecessary reaction components and minimize by-product formation in the cDNA amplification, as well as the introduction of barcoded primers to each sample and the cDNA amplification itself.
14. In a 1.5 mL tube, mix 50 µL Dynabeads MyOne Carboxylic Acid beads and 400 µL capture buffer (previously stored for a maximum of 6 months at 21°C-22°C).
15. Add 7 µL of the magnetic beads and capture buffer mix to the RT product. Mix well by pipetting or quick vortexing (vortex together with the plate holder to reduce liquid spilling to the caps).
16. Incubate for 15 min at room temperature (21°C-22°C) and place the plate on a 96-well magnetic rack for 3 min.
17. Remove all supernatant by pipetting.
18. Spin the plate, or take the strips apart and spin them down on the table centrifuge, and place back on the magnetic rack. Remove all traces of the supernatant.
19. Resuspend the beads in 22 µL nuclease-free water.
20. Add 1.5 µL of barcoded primers (10 µM) (previously stored at -20°C for up to 12 months).
21. Prepare the PCR master mix in a 2 mL Eppendorf DNA LoBind tube.
22. Vortex thoroughly and spin down the tube.
Note: It is possible to visualize the cDNA products on a 2% TAE gel if the library isn't planned to be sequenced (Figure 3).

Timing: 30 min
This section describes pooling together all the samples to one reaction as they have already been barcoded in the previous steps and purifying the reaction.
26. Place the plate on the 96-well magnetic rack and wait for 3 min for the beads to bind.
27. Collect the clear supernatants into six 1.5 mL LoBind tubes. Purify the library with a PCR clean-up column (NucleoSpin Gel and PCR Clean-up, Macherey-Nagel; https://www.mn-net.com/media/pdf/02/1a/74/Instruction-NucleoSpin-Gel-and-PCR-Clean-up.pdf).
Note: Purification should be done according to the manufacturer's instructions, except for step 4, where the silica membrane should be centrifuged for 2 min instead of 1 min, and step 5, where the elution buffer provided in the kit should be preheated at 70°C for 5 min.
28. Elute the cDNA twice using 16 µL of the preheated elution buffer each time. At the end there should be 32 µL of cDNA in a clean 1.5 mL LoBind tube.
29. Take out 2 µL of the library for the 1st quality check with TapeStation (QC1). There is now 30 µL of library left for the next step.
Pause point: At this point in the experiment, it is possible to pause by placing the library at -20°C for up to 1 month.

Fragmentation (day 2)
Timing: 40 min
This section describes the fragmentation of DNA to the desired length (300-350 bp) for sequencing on the Illumina platform.
Note: Purification should be done according to the manufacturer's instructions, except for step 4, where the silica membrane should be centrifuged for 2 min instead of 1 min.
39. Elute the cDNA with 32 µL of the elution buffer provided in the kit, and take 2 µL aside for the 2nd quality check with TapeStation (QC2).
CRITICAL: While assembling the holder for sonication, it is important to place 11 extra 0.65 mL BioRuptor shearing microtubes containing 100 µL of water together with the one tube containing 100 µL of cDNA, to ensure equivalent sonication.
Note: The fragmentation step can be done using other compatible machines, such as the Covaris Focused-ultrasonicator, by using 130 µL microtubes and targeting 300 bp according to the manufacturer's protocol.
Pause point: At this point in the experiment, it is possible to pause by placing the library at -20°C for up to 1 month.

Timing: 4 h
This section describes the preparation of DNA for sequencing. Because fragmentation cuts the DNA at random points, the ends of the DNA must be repaired before the adaptors can be ligated.

Final purification (day 2)
Timing: 20 min
This section describes the final library purification needed to remove unwanted residues of the primers and reaction components before the library can be sequenced.
Pause point: Store the sequencing-ready library at -20°C for up to 6 months.

Library quality control (day 3)
Before the library can be sent for sequencing, a final quality control should be performed to determine the library concentration, which is essential for the sequencing.
67. Check the concentration of the ready library from step 66 on the TapeStation D1000 ScreenTape system. The concentration of the ready library should be 2-30 nM and peak should be around 300 bp (see Figure 4 for expected quality check results).

Reagent Amount
Beads containing cDNA in Buffer EB (from step 57) 20 µL

CRITICAL: It is important that the correct library concentration is determined before sequencing, as the volume of library loaded for sequencing is decided based on it. If the concentrations measured with the TapeStation and the KAPA kit differ, we rely on the results provided by the KAPA kit.
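When an instrument reports only a mass concentration, the molarity needed for the 2-30 nM check can be estimated from the mean fragment size. The sketch below uses the commonly applied approximation of 660 g/mol per base pair of double-stranded DNA; the function names and the example input are hypothetical and are not taken from the protocol.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per bp of dsDNA (common approximation)


def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


def in_expected_range(conc_nm, low_nm=2.0, high_nm=30.0):
    """True if the library falls in the 2-30 nM range given in the text."""
    return low_nm <= conc_nm <= high_nm


# Hypothetical example: 1.0 ng/µL library with a peak around 300 bp
molarity = ng_per_ul_to_nm(1.0, 300)
print(f"{molarity:.2f} nM, within expected range: {in_expected_range(molarity)}")
```

This estimate is only as good as the fragment size used, so a qPCR-based measurement remains the more reliable basis for loading.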

Sequencing (day 3)
Sequencing of STRT-N libraries can be performed on the Illumina NextSeq System. While read 1 is 75+ cycles of non-index read, read 2 must be exactly 6 cycles of index read as the sample barcode is 6 bp. Download the base call (BCL) files; demultiplexing and conversion to FASTQ files are unnecessary. The location of the ''BaseCalls'' folder in the files will be specified as BaseCallsDir_PATH at step 72. ''RunInfo.xml'' in the BCL files records the number of cycles.
Note: We usually sequence the libraries using NextSeq 500 with the custom primer STRT2seq (Table 1) and the High Output v2.5 kit, 75 cycles, at the Biomedicum Functional Genomics Unit (FuGU). In the BCL files, there should be the ''BaseCalls'' folder at Data/Intensities. Although we use the 75 cycles kit, 86 and 6 cycles of sequencing are performed for reads 1 and 2, respectively. Therefore, we usually describe the logical structure as 8M3S75T6B (for step 72a; 8 + 3 + 75 = 86); change the structure when the length of read 1 is shorter than 86 cycles. Read 1 is thus 86 bp long, while read 2 is 6 bp long.

Preparations for data processing
This section includes the minimal hardware requirements, the installation of conda packages and pipelines, and details of the genome indexes required by the STRT-N bioinformatics data processing pipeline. The installation of conda packages and pipelines and the STRT-N pipeline itself should be run in a Linux terminal. We tested the pipeline on Windows Subsystem for Linux running Ubuntu 20.04.3 on a Windows 11 laptop, and on the Puhti Linux cluster at the IT Center for Science (CSC) services (https://docs.csc.fi/computing/systems-puhti/).

Hardware requirements
Building genome indexes (step 71), sequence data processing (step 72), visualization on UCSC (steps 73-74) and visualization using Seurat (step 75) require about 150 GB, 15 GB, 50 MB and 150 MB of memory, respectively, depending on genome size and raw data size.

Installing packages with conda and pipelines
This section covers the installation of the required dependencies and the pipeline.
69. Create a conda environment, install the required software packages listed in the yaml file, and activate the environment as follows.
Note: In this example, the required software packages are installed in an environment named ''STRTN-test'', but you may change it to any name according to your project.
70. Install the main pipeline and visualization of data pipeline from the following GitHub repository.
Note: In this example, the pipeline will be cloned into the ''STRTN-test'' folder, but you may change it to any name according to your project.

Preparing genome indexes and sequence dictionary
Timing: ~3 h using 8 CPU cores
71. Prepare HISAT2 genome indexes and a sequence dictionary on a platform with large memory, so that reference sequence information can be accessed rapidly and efficiently. Here, we illustrate the case for the house mouse genome mm39.
a. Obtain the genome sequences of the reference and the ERCC spike-ins.
b. Extract splice sites and exons from the annotation file using HISAT2 (v2.2.1). 9 Here, we used wgEncodeGencodeBasicVM30 as the annotation file.
c. Build the HISAT2 indexes of the mouse reference genome and ERCC Spike-in RNAs with the hisat2-build function.
xii. BigBed file for the coding-5′ end annotation file, named coding_5end.bb
Note: Spike-in_5end_rate represents the proportion of STRT reads aligned to the 5′ end (50 nt) of spike-in RNAs, and coding-5end_rate represents the proportion of STRT reads aligned within the 5′ UTR of protein-coding genes and the proximal (500 nt) upstream region. These values must be high and stable, as a decrease indicates poor library synthesis quality. However, a low or unstable coding-5end_rate is sometimes acceptable when it is expected, for example in the study of preimplantation embryos in which maternal transcripts are degrading. The assessment of whether values are good or bad depends on factors such as sample type and quality. For example, we found that the spike-in 5′ end rate is 85%-95%, while the coding 5′ end rate is 45%-65%, in a successful mouse library.
72. Run the STRT-N main pipeline with the following command (see an example run). Please make sure that barcode sequence with barcode name (
Here, we describe the step-by-step methods, from demultiplexing, read mapping, marking duplicates, annotation, quality check, and quantification to the creation of files for visualization, as well as optional analyses.
a. To correctly separate and analyze the sequencing data based on barcodes, the barcodes are prepared, the number of lanes is determined, and the necessary parameter file for the barcodes is created. To determine which sequences came from which samples, the raw base call files from the sequencer are demultiplexed with Picard ExtractIlluminaBarcodes and IlluminaBasecallsToSam. This step processes the raw base call files and the related binary reporting (InterOp) and cluster location (LOC) files, and creates unaligned BAM files based on the well-specific barcodes (Table 2). 1
Note: It is necessary to convert the sequencer output format to the downstream formats like SAM/BAM by defining the read structure. The input data consists of 92 base clusters (cycles) and we described this logical structure as 8M3S75T6B: 8 cycles (bases) of molecular barcode, 3 cycles skipped, 75 cycles of the template, and 6 cycles of sample barcode.
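To make the read structure notation concrete, here is a minimal sketch that splits a structure string into its segments and sums the cycles. It is an illustration only, not part of the STRT-N pipeline; the function names are hypothetical, and only the four operators named in the note (M, S, T, B) are handled.

```python
import re

# Operators described in the note above
OPERATORS = {
    "M": "molecular barcode (UMI)",
    "S": "skipped",
    "T": "template",
    "B": "sample barcode",
}


def parse_read_structure(structure):
    """Split a string such as '8M3S75T6B' into (length, operator) pairs."""
    if not re.fullmatch(r"(\d+[TBMS])+", structure):
        raise ValueError(f"not a valid read structure: {structure!r}")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([TBMS])", structure)]


def total_cycles(structure):
    """Total number of sequencing cycles covered by the structure."""
    return sum(length for length, _ in parse_read_structure(structure))


for length, op in parse_read_structure("8M3S75T6B"):
    print(f"{length:>3} cycles: {OPERATORS[op]}")
print("total cycles:", total_cycles("8M3S75T6B"))
```

Checking the total against the cycle counts recorded in RunInfo.xml is a quick way to catch a structure that no longer matches a shorter run.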
b. Reads are aligned to the GRCm39 reference genome and ERCC Spike-in RNAs (SRM 2374) [NIST SRM 2374 Certificate of Analysis, https://www-s.nist.gov/srmors/certificates/2374.pdf, 2013] with the Gencode annotation file (wgEncodeGencodeBasicVM30) as a guide for exon junctions. The unaligned BAM files are sorted and converted to FASTQ files with Picard SamToFastq and aligned using HISAT2.
c. To generate unique molecular identifier (UMI)-annotated BAM files, the aligned BAM files are merged with the original unaligned BAM files by Picard MergeBamAlignment.
d. To merge all lanes, the UMI-annotated BAM files corresponding to each sample, derived from four lanes, are merged using Picard MergeSamFiles. To mark potential PCR duplicates, Picard MarkDuplicates is used.
e. To check the quality of reads, information on genomic regions is extracted from the annotation files. After that, the resulting BAM files per sample are indexed, and reads flagged as non-primary alignments (flag 256) or PCR duplicates (flag 1024) are removed using SAMtools (v1.6). 10 Reads mapped to protein-coding genes and spike-in RNAs, and to their 5′ ends, are counted using BEDtools (v2.30.0). 11 The values of i) log10 of total mapped read counts, ii) mapping rates, iii) log10 of Spike-in RNA read counts, iv) ratios of total mapped read counts versus Spike-in RNA read counts, v) 5′ end capture rates of Spike-in RNAs, and vi) 5′ end capture rates of protein-coding genes are plotted with the R (4.2.0) 12 package ggplot2 (v3.4.0), 13 and the outlier samples are marked with barcode numbers (Figure 5). Please consider how to treat these outlier samples in further downstream analysis in accordance with your scientific question. As an example, we decided to remove outlier samples from the successful mouse library to obtain a reliable data representation, because these samples might have been degraded.
f. To quantify gene expression levels, Subread featureCounts (2.0.1) 14 is used to count reads aligned to the 5′ end of genes with the parameters '-s 1 --largestOverlap --ignoreDup --primary'. In this step, uniquely mapped reads within the 5′ UTR or 500 bp upstream of protein-coding genes and within the first 50 bp of spike-in sequences are counted.
g. To visualize read alignments, the resulting output BAM files are used. The BAM files are indexed, and reads flagged as non-primary alignments (flag 256) or PCR duplicates (flag 1024) are removed using SAMtools with the commands below for each sample.
h. To visualize the number of reads as a continuous signal, BigWig files are created. First, BedGraph files are created from the BAM files using bedtools genomecov, with a scale factor calculated for each sample by running bin/calculationScaleFactor.R. Next, the BedGraph files are converted into BigWig files for the forward and reverse strands of each sample, together with the chromosome size files, using bedGraphToBigWig.
i. To visualize annotation items, a BigBed file is created from the coding-5′-end.bed file. The output coding-5′-end BED file is sorted and converted into a BigBed file with the chromosome size file using bedToBigBed with the commands below.
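The flag-based filtering in steps e and g relies on the bitwise FLAG field of the SAM format, in which 256 marks a non-primary alignment and 1024 marks a PCR or optical duplicate. The sketch below only illustrates that bit logic on plain integers; it is not the pipeline's own code, and the function name and example flag values are hypothetical.

```python
NOT_PRIMARY = 0x100  # 256: not the primary alignment
DUPLICATE = 0x400    # 1024: PCR or optical duplicate
EXCLUDE_MASK = NOT_PRIMARY | DUPLICATE  # 1280


def keep_read(flag):
    """True if neither the non-primary nor the duplicate bit is set."""
    return (flag & EXCLUDE_MASK) == 0


# Hypothetical FLAG values: 16 is reverse strand, 1040 = 1024 + 16, 272 = 256 + 16
example_flags = [0, 16, 256, 1024, 1040, 272]
kept = [flag for flag in example_flags if keep_read(flag)]
print(kept)  # [0, 16]
```

A read is dropped when either bit is set, so a combined mask of 1280 expresses the same exclusion as testing the two flags one after the other.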
Optional: The gene-based analysis described above, run with STRTN.sh, considers only reads within the 5′ UTR or the proximal upstream region of protein-coding genes. In addition, one can perform TFE (transcript far 5′ end)-based analysis by running STRTN-TFE.sh. This analysis was originally developed by Töhönen et al. 20 and was modified to run on CSC and on a laptop based on the TFE analysis in Ezer et al. 1 Here, TFEs are defined as the first exon (5′ end region) of assembled STRT-N reads mapped to the reference genome using StringTie (v2.1.7); 17 they are assigned unique IDs, and their expression levels are quantified as described above. The details are available at https://github.com/gyazgeldi/STRTN/blob/master/STRTN-TFE-README.md.
Note: If TFE-based analysis is planned, please make sure to add the option '--dta' (downstream-transcriptome-assembly) to the main pipeline STRTN.sh or STRTN-CSC.sh. This option is required in the HISAT2 alignment process and helps the transcript assembly by StringTie.
Optional: One can perform a sequence quality check of the sequenced data by running fastq-fastQC.sh or fastq-fastQC-CSC.sh. Here, the output BAM files are converted into FASTQ files (without duplicated reads) that can be submitted to public sequence databases. FastQC reports are then generated for each FASTQ file, and a MultiQC report is generated from the FastQC results.

Visualization of the results in the UCSC Genome Browser
Timing: ~30 min
This section describes how to visualize the results in genome browsers, for example the UCSC Genome Browser 21 or the Integrative Genomics Viewer (https://igv.org/). Below, we describe the steps for visualization of data in UCSC using BAM files for alignments, BigWig files for read frequency, and the BigBed file of coding-5′ end for annotation items. The detailed procedure of the visualization pipeline is in https://github.com/gyazgeldi/STRTN/blob/master/Visualization-in-UCSC-README.md. Overall, the pipeline takes the resulting BAM files, BigWig files and BigBed file of coding-5′ end as inputs and automatically produces the following output files.
75. The resulting output count matrix is used to create a Seurat object with the CreateSeuratObject function, and the data are normalized based on spike-in normalization with the NormalizeData function. We used spike-in normalization because the amount of RNA changes between the stages of preimplantation development. 22 The default per-million normalization is therefore not valid for this STRT-N library analysis. Instead, we added 1 to all expression values and divided the values by the sum of the spike-in expression values (see below). The difference between expression profiles with and without spike-in normalization is demonstrated in Figure 6. We highly recommend spike-in normalization to enable accurate comparisons of expression levels between samples. As an example, we then visualized the resulting data from the mouse STRT-N library (Figure 7).
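The spike-in normalization described in step 75 can be illustrated for a single sample as follows. This is a minimal sketch rather than the authors' Seurat code: the function name and the gene names are hypothetical, and it assumes that the spike-in sum is taken after adding the pseudocount of 1, which the text does not state explicitly.

```python
def spike_in_normalize(counts, spike_in_ids, pseudocount=1.0):
    """Normalize one sample: (count + pseudocount) / sum of spike-in values.

    counts: dict mapping feature name to raw count for a single sample.
    spike_in_ids: names of the spike-in features present in counts.
    Assumption: the spike-in sum uses the pseudocounted values.
    """
    shifted = {name: value + pseudocount for name, value in counts.items()}
    spike_sum = sum(shifted[name] for name in spike_in_ids)
    if spike_sum <= 0:
        raise ValueError("spike-in sum must be positive")
    return {name: value / spike_sum for name, value in shifted.items()}


# Hypothetical counts for one sample; GeneA and GeneB are placeholder names
sample = {"GeneA": 9, "GeneB": 0, "ERCC-00002": 39, "ERCC-00004": 59}
normalized = spike_in_normalize(sample, ["ERCC-00002", "ERCC-00004"])
print(normalized["GeneA"], normalized["GeneB"])
```

Because every sample receives the same amount of spike-in RNA, dividing by the spike-in signal keeps differences in total RNA content between developmental stages visible instead of scaling them away.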

EXPECTED OUTCOMES
Expected results of the library quality control checkpoints are shown in Figure 4. The concentration of the ready library should be 2-30 nM. Figure 3 shows how the cDNA products look on a 2% TAE gel after the cDNA amplification (step 24). There should be a smear on the gel.
From one sequencing kit, we obtained 181,843-12,756,053 reads per sample with a median of 4,205,390. After excluding duplicated reads, we obtained 50,793-8,392,691 reads per sample with a median of 2,436,566 for a mouse library (2.0 pM of library and 20% of PhiX). The values of log10 of total mapped read counts, log10 of Spike-in RNA read counts, 5′ end capture rates of Spike-in RNAs, mapping rates, ratios of total mapped read counts versus Spike-in RNA read counts, and 5′ end capture rates of protein-coding genes are expected to be at least 6.0, 2.5, 80%, 40%, 5000 and 50%, respectively, depending on the annotation status of the relevant model organism. These values can be found in a plot in the out directory named OUTPUT-QC-plots.pdf. The expected plot, obtained from a successful library, is shown in Figure 5. In addition, the gene-based analysis of this library yielded 14,469 protein-coding genes with at least five mapped reads, while the TFE-based analysis yielded 15,031 protein-coding and non-protein-coding genes with at least five mapped reads after excluding intronic and unannotated TFEs.
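The expected minimum values listed above lend themselves to a simple per-sample check. The sketch below encodes those thresholds as stated in the text; the metric names, the function name, and the example sample values are hypothetical and do not correspond to column names in the pipeline output.

```python
# Minimum expected values as given in the text (rates in percent)
MIN_EXPECTED = {
    "log10_mapped_reads": 6.0,
    "log10_spikein_reads": 2.5,
    "spikein_5end_rate": 80.0,
    "mapping_rate": 40.0,
    "mapped_to_spikein_ratio": 5000.0,
    "coding_5end_rate": 50.0,
}


def failed_metrics(sample_metrics):
    """Return the names of metrics that fall below the expected minimum."""
    return [
        name
        for name, minimum in MIN_EXPECTED.items()
        if sample_metrics[name] < minimum
    ]


# Hypothetical sample with a low coding 5' end rate
sample = {
    "log10_mapped_reads": 6.4,
    "log10_spikein_reads": 2.7,
    "spikein_5end_rate": 90.0,
    "mapping_rate": 65.0,
    "mapped_to_spikein_ratio": 5200.0,
    "coding_5end_rate": 42.0,
}
print(failed_metrics(sample))  # ['coding_5end_rate']
```

A flagged metric is a prompt for inspection rather than an automatic exclusion, since a low coding 5′ end rate can be expected in embryos with degrading maternal transcripts.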

LIMITATIONS
This method is ideal for samples with very limited availability, such as mammalian embryos. The library consists of only 48 samples; thus, it is not suitable for cell lines from which enough RNA can be extracted. In this case, bulk STRTseq would be better. 3 If the aim of your study is to examine different TSS, STRTseq is superior in the coverage of the 5′ end, as shown in Abugessaisa et al. 23 In addition, as this is a 5′ end method, data about splice isoforms and 3′ UTR isoforms are not available.

Potential solution
Make sure that all the MyOne Carboxylic acid beads are well resuspended in the water before adding the barcoded primers and PCR mix. In case the beads are stuck to the walls of the well, we recommend spinning down the plate before resuspending the beads.

Problem 2
Dominant peak at around 100 bp and little or no other size products in QC1.

Potential solution
During the PCR the primers can form dimers that can take over the reaction.
Make sure that supernatant in step 17 is removed completely. If this peak is not dominant and there is also enough cDNA, then the results will not be affected as these smaller products will be removed in the purification steps.

Problem 3
No wide peak around 350 bp in QC2.

Potential solution
It is possible that fragmentation didn't work efficiently. It is necessary to repeat the fragmentation step as follows: Measure the exact amount of cDNA that is left after taking the 2 µL for QC2. Add AMPure XP beads to the sample at a ratio of 0.65× (this is to capture the larger fragments that haven't been fragmented), mix well and incubate for 10 min on the bench. Put the tube on the magnetic rack and wait for 3 min until the solution is clear. Transfer the supernatant to a different tube (this is to protect the already fragmented shorter products from over-fragmentation) and place the tube in the fridge until the fragmentation of the longer products has been done again. Remove the tube from the magnetic rack, add 100 µL of Buffer EB and wait for 1 min. Place the tube again on the magnetic rack and wait until the solution is clear. Transfer this solution to a new 0.65 mL BioRuptor microtube and perform the fragmentation following steps 36 and 37. Before purifying the content as in step 38, mix the solution from the tube kept in the fridge, which contains the shorter products, with the solution in the 0.65 mL BioRuptor tube. Perform the clean-up as in step 38 and repeat QC2.

Problem 4
The concentration of the final library is low.

Potential solution
In the final library amplification (step 58), the number of PCR cycles can be increased by 1-2 cycles.

Problem 5
The mean quality value of base positions at the end of the read is low.

Potential solution
If the mean quality value of base positions at the end of the read is low in the sequence quality histogram in the MultiQC report, reads can be trimmed by re-defining the read structure to use in demultiplexing step using Picard tool.
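One way to picture the re-defined read structure is to shorten the template segment and mark the trimmed cycles as skipped. The sketch below builds such a string from the default 8M3S75T6B layout; the function name and the choice of trimming 10 cycles are hypothetical, and the resulting structure should be checked against the Picard documentation before use.

```python
def trimmed_read_structure(umi=8, skip=3, template=75, barcode=6, trim=0):
    """Build a read structure with the last `trim` template cycles skipped."""
    if not 0 <= trim < template:
        raise ValueError("trim must be between 0 and template - 1")
    kept = template - trim
    tail = f"{trim}S" if trim else ""
    return f"{umi}M{skip}S{kept}T{tail}{barcode}B"


print(trimmed_read_structure())         # 8M3S75T6B
print(trimmed_read_structure(trim=10))  # 8M3S65T10S6B
```

The total number of cycles stays the same, so the structure still matches the run while the low-quality tail of read 1 is left out of the template.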

Problem 6
Low Spike-In concentration in the sequencing library.

Potential solution
Spike-In concentration should not be lower than 1000 copies per sample. We recommend the range to be between 1000-10 000 per sample. Based on this, calculate the initial Spike-In dilution that you will add to the library.

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Nina Boskovic (nina.boskovic@ki.se).

Materials availability
This study did not generate any new reagents.

Data and code availability
The raw data (BCL files) have been deposited